feat: add DSA cache and PP support#134

Merged
feiqiangs merged 1 commit into taco-project:feat/layerwise_rebase from zhjc1124:feat/layerwise_rebase
Apr 6, 2026

Conversation

@zhjc1124
Contributor

@zhjc1124 zhjc1124 commented Apr 2, 2026

Summary

This PR adds DSA (Dynamic Sparse Attention) cache support and Pipeline Parallelism (PP) support to FlexKV.

Changes

New Features

DSA/NSA Indexer Cache Support

  • Added dataclass in to hold indexer-specific cache configuration (e.g., , , ) for DSA/NSA sparse attention models
  • Extended to manage separate indexer storage handles () for CPU, SSD, and REMOTE devices
  • Extended to accept optional indexer GPU blocks
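
To make the indexer-cache idea concrete, here is a minimal sketch of what such a configuration dataclass could look like. The class and field names (`IndexerCacheConfig`, `index_head_dim`, `index_n_heads`, `index_topk`) are illustrative assumptions, not FlexKV's actual API; the PR's real identifiers were lost in this page's rendering.

```python
from dataclasses import dataclass

# Hypothetical sketch: a dataclass holding indexer-specific cache
# configuration for DSA/NSA sparse attention models. All names here
# are assumptions for illustration, not FlexKV's real identifiers.
@dataclass
class IndexerCacheConfig:
    index_head_dim: int   # head dimension of the indexer keys
    index_n_heads: int    # number of indexer heads
    index_topk: int       # top-k blocks selected by the sparse indexer

    def bytes_per_token(self, dtype_size: int = 2) -> int:
        """Size of one token's indexer cache entry (fp16 -> 2 bytes)."""
        return self.index_n_heads * self.index_head_dim * dtype_size

cfg = IndexerCacheConfig(index_head_dim=128, index_n_heads=1, index_topk=2048)
print(cfg.bytes_per_token())  # 1 * 128 * 2 = 256
```

A separate config like this lets the cache manager size the indexer storage handles (CPU, SSD, REMOTE) independently of the main KV cache, whose per-token footprint is generally much larger.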

Pipeline Parallelism (PP) Support

  • Added parameter to so each PP rank only manages its own layers instead of the full model layer count
  • Fixed resolution to use total heads (not per-rank heads) for correct KV layout
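
The per-rank layer ownership described above can be sketched as follows. This is a generic contiguous-partition helper under the common convention that remainder layers go to the earliest ranks; the function name and signature are assumptions for illustration, not the parameter this PR actually adds.

```python
def layers_for_pp_rank(num_total_layers: int, pp_size: int, pp_rank: int) -> range:
    """Hypothetical helper: the contiguous slice of layers owned by one
    pipeline-parallel rank. Remainder layers are assigned to the earliest
    ranks, a common convention (not necessarily FlexKV's)."""
    base, rem = divmod(num_total_layers, pp_size)
    start = pp_rank * base + min(pp_rank, rem)
    count = base + (1 if pp_rank < rem else 0)
    return range(start, start + count)

# 30 layers over 4 PP ranks -> per-rank sizes 8, 8, 7, 7
for r in range(4):
    owned = layers_for_pp_rank(30, 4, r)
    print(f"rank {r}: layers {owned.start}..{owned.stop - 1} ({len(owned)} layers)")
```

Sizing each rank's cache by `len(layers_for_pp_rank(...))` rather than the full model layer count is the point of the fix: with PP, a rank stores KV blocks only for its own layers. Head-count resolution is the opposite case — PP splits layers, not attention heads, so the KV layout must be derived from the model's total head count.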

@YconquestY YconquestY self-requested a review April 2, 2026 08:57
@zhjc1124 zhjc1124 force-pushed the feat/layerwise_rebase branch 8 times, most recently from 619f422 to d5033a7 on April 5, 2026 04:30
@zhjc1124 zhjc1124 force-pushed the feat/layerwise_rebase branch from f8ca05d to 57045c3 on April 5, 2026 09:08
@feiqiangs feiqiangs self-requested a review April 6, 2026 02:41
@feiqiangs feiqiangs merged commit 72c5187 into taco-project:feat/layerwise_rebase Apr 6, 2026
feiqiangs pushed a commit that referenced this pull request Apr 6, 2026